Vectorized and performance‐portable quicksort
Authors
Abstract
Recent works showed that implementations of quicksort using vector CPU instructions can outperform the non-vectorized algorithms in widespread use. However, these implementations are typically single-threaded, implemented for a particular instruction set, and restricted to a small set of key types. We lift these three restrictions: our proposed vqsort algorithm integrates into the state-of-the-art parallel sorter ips⁴o, with a geometric mean speedup of 1.59. The same implementation works on seven instruction sets (including SVE and RISC-V V) across four platforms. It also supports floating-point and 16–128 bit integer keys. To the best of our knowledge, this is the fastest sort for large arrays of non-tuple keys on CPUs, up to 20 times as fast as the sorting algorithms implemented in standard libraries. This article focuses on the practical engineering aspects enabling the speed and portability, which we have not yet seen demonstrated for a quicksort implementation. Furthermore, we introduce compact and transpose-free sorting networks for in-register sorting of small arrays, and a vector-friendly pivot sampling strategy that is robust against adversarial input.
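The two building blocks named at the end of the abstract, a sorting network for small arrays and a pivot chosen from a sample, can be illustrated with a scalar sketch. This is not the vqsort implementation: it uses no vector instructions, the network is a plain odd-even transposition network rather than the compact transpose-free networks of the article, and the names `network_sort`, `sample_pivot`, `sketch_sort`, and `BASE_CASE` as well as the sample size are assumptions made for illustration only.

```python
import random

BASE_CASE = 8  # hypothetical threshold below which the network takes over


def network_sort(a, lo, hi):
    """Sort a[lo:hi] in place with an odd-even transposition network.

    The sequence of compare-exchange pairs depends only on the length,
    never on the data, which is the property that makes sorting networks
    attractive for SIMD code.
    """
    n = hi - lo
    for rnd in range(n):
        for i in range(lo + rnd % 2, hi - 1, 2):
            x, y = a[i], a[i + 1]
            a[i], a[i + 1] = min(x, y), max(x, y)  # compare-exchange


def sample_pivot(a, lo, hi, rng, k=9):
    """Return the median of k randomly sampled keys from a[lo:hi]."""
    sample = sorted(a[rng.randrange(lo, hi)] for _ in range(k))
    return sample[k // 2]


def sketch_sort(a, seed=0):
    """Quicksort with a sampled pivot and a sorting-network base case."""
    rng = random.Random(seed)
    stack = [(0, len(a))]  # explicit stack of half-open ranges
    while stack:
        lo, hi = stack.pop()
        if hi - lo <= BASE_CASE:
            network_sort(a, lo, hi)
            continue
        pivot = sample_pivot(a, lo, hi, rng)
        # Three-way partition: [lo, lt) < pivot, [lt, gt) == pivot, [gt, hi) > pivot.
        lt, i, gt = lo, lo, hi
        while i < gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                gt -= 1
                a[gt], a[i] = a[i], a[gt]
            else:
                i += 1
        stack.append((lo, lt))
        stack.append((gt, hi))
    return a
```

Because the pivot is always one of the keys in the range, the middle partition is never empty and both subranges shrink on every step, so the sketch terminates even on arrays of identical keys. Random sampling makes a consistently bad pivot unlikely, but the sketch makes no claim to match the robustness of the strategy described in the article.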
Similar resources
A Novel Hybrid Quicksort Algorithm Vectorized using AVX-512 on Intel Skylake
The modern CPU’s design, which is composed of hierarchical memory and SIMD/vectorization capability, governs the potential for algorithms to be transformed into efficient implementations. The release of the AVX-512 changed things radically, and motivated us to search for an efficient sorting algorithm that can take advantage of it. In this paper, we describe the best strategy we have found, whi...
Quicksort Revisited - Verifying Alternative Versions of Quicksort
We verify the correctness of a recursive version of Tony Hoare’s quicksort algorithm using the Hoare-logic based verification tool Dafny. We then develop a non-standard, iterative version which is based on a stack of pivot-locations rather than the standard stack of ranges. We outline an incomplete Dafny proof for the latter.
Quicksort asymptotics
The number of comparisons $X_n$ used by Quicksort to sort an array of $n$ distinct numbers has mean $\mu_n$ of order $n \log n$ and standard deviation of order $n$. Using different methods, Régnier and Rösler each showed that the normalized variate $Y_n := (X_n - \mu_n)/n$ converges in distribution, say to $Y$; the distribution of $Y$ can be characterized as the unique fixed point with zero mean of a certain distribution...
Vectorized Cluster Search
Contrary to conventional wisdom, the construction of clusters on a lattice can easily be vectorized, namely over each "generation" in a breadth-first search. This applies directly to, e.g., the single-cluster variant of the Swendsen-Wang algorithm. On a Cray Y-MP, total CPU time was reduced by a factor of 3.5 to 7 in actual applications. (Submitted to Computer Physics Communications.)
Resilient Quicksort and Selection
We consider the problem of sorting a sequence of n keys in a RAM-like environment where memory faults are possible. An algorithm is said to be δ-resilient if it can tolerate up to δ memory faults during its execution. A resilient sorting algorithm must produce a sequence where every pair of uncorrupted keys is ordered correctly. Finocchi, Grandoni, and Italiano devised a δ-resilient determinist...
Journal
Journal title: Software - Practice and Experience
Year: 2022
ISSN: 0038-0644, 1097-024X
DOI: https://doi.org/10.1002/spe.3142